Abstract

In recent work [1] we uncovered intriguing connections between Otto's characterisation
of diffusion as entropic gradient flow [16] on one hand and large-deviation principles
describing the microscopic picture (Brownian motion) on the other. In this paper, we
sketch this connection, show how it generalises to a wider class of systems, and comment
on consequences and implications.

Specifically, we connect macroscopic gradient flows with large-deviation principles,
and point out the potential of a bigger picture emerging: we indicate that in some
non-equilibrium situations, entropies and thermodynamic free energies can be derived via
large-deviation principles. The approach advocated here is different from the established
hydrodynamic limit passage but extends a link that is well known in the equilibrium
situation.

1 Introduction

For systems in equilibrium, it is well known that the roles of energy and entropy can be
understood rigorously in terms of large-deviation principles. We describe two examples below.
Recently, we showed how large-deviation principles also allow us to understand the role of
entropy in a specific non-equilibrium system [1]: the large-deviation behaviour of a system of
independent Brownian particles connects rigorously to the entropy gradient-flow structure of
the diffusion equation. We explain this connection in Section 3.1.

The aim of this paper is to take this connection two steps further. The first step is to
extend the connection of [1], which was studied in a discrete-time context, to the case of
continuous time. The second step is to discuss a variety of examples that illustrate the
breadth of this phenomenon, and to suggest a general principle that might hold across a wide
range of systems.

In equilibrium systems, the connection is as follows. Let $X_i$ ($i = 1, 2, \dots$) be independent
and identically distributed random variables with distribution $\mu$ on a state space $\mathcal{X}$. We
think of the $X_i$ as positions of particles in the space $\mathcal{X}$, so that their concentration is given
by the empirical measure $\rho_n := \frac1n \sum_{i=1}^n \delta_{X_i}$. Sanov's theorem (e.g., [3, Sec. 6.2]) states that
the random measure $\rho_n$ satisfies the large-deviation principle

$$\mathrm{Prob}(\rho_n \approx \rho) \sim \exp[-n I(\rho)], \qquad \text{as } n \to \infty, \tag{1}$$

where the rate function $I$ is the relative entropy of $\rho$ with respect to $\mu$, which is

$$I(\rho) = H(\rho|\mu) := \begin{cases} \int f \log f \, d\mu & \text{if } \rho \ll \mu \text{ and } \rho = f\mu, \\ +\infty & \text{otherwise.} \end{cases}$$

This property illustrates how the relative entropy $H(\rho|\mu)$ characterises the probability of
observing a state $\rho$: higher relative entropy means smaller probability, as described by (1).
It also provides a rigorous version of the well-known thermodynamic principle that a system
aims to maximise its entropy (which corresponds to minimising $H(\rho|\mu)$, since the physical
entropy carries the opposite sign). For in the limit of large $n$, the characterisation (1) gives
vanishing probability to all states $\rho$ except those for which $I(\rho) = 0$; in other words, only the
minimisers of $I$ have non-vanishing probability.
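
The statement (1) can be checked numerically in the simplest setting. The following Python sketch assumes a state space with three points (an illustrative choice, as are the values of `mu` and `rho` and all function names); it computes the exact multinomial probability that the empirical measure of $n$ samples equals `rho` and compares $-\frac1n \log \mathrm{Prob}$ with the relative entropy.

```python
import math

def relative_entropy(rho, mu):
    """H(rho|mu) = sum_x rho(x) log(rho(x)/mu(x)) on a finite state space."""
    return sum(r * math.log(r / m) for r, m in zip(rho, mu) if r > 0)

def log_prob_empirical(counts, mu):
    """Exact log-probability that n i.i.d. draws from mu produce these counts."""
    n = sum(counts)
    log_coeff = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return log_coeff + sum(c * math.log(m) for c, m in zip(counts, mu))

mu = [0.5, 0.3, 0.2]    # reference distribution (illustrative assumption)
rho = [0.2, 0.3, 0.5]   # observed empirical measure (illustrative assumption)

def sanov_rate(n):
    """-(1/n) log Prob(rho_n = rho); n should be a multiple of 10 here."""
    counts = [round(n * r) for r in rho]
    return -log_prob_empirical(counts, mu) / n

for n in (10, 100, 1000, 10000):
    print(n, sanov_rate(n), relative_entropy(rho, mu))
```

The gap between the two printed columns shrinks as $n$ grows, which is the content of (1) in this finite setting; the remaining difference is the sub-exponential prefactor that a large-deviation principle ignores.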

This connection between entropy and large-deviation principles extends to systems
involving energy. In the appendix we show, for instance, how coupling a system with
energy $E$ to a heat bath with temperature $\theta$ changes the rate functional $I$ to the free energy
$\mathcal{F}(\rho) := H(\rho|\mu) + (k\theta)^{-1} E(\rho) + \text{constant}$:

$$\mathrm{Prob}(\rho_n \approx \rho) \sim \exp[-n \mathcal{F}(\rho)], \qquad \text{as } n \to \infty. \tag{2}$$

In the same way as (1) explains why relative entropy is minimised, (2) explains why systems
coupled to a heat bath minimise their free energy: when $n$ is large, only states $\rho$ with
near-minimal free energy $\mathcal{F}(\rho)$ will have finite probability.

As mentioned above, the central aim of this paper is to show how this connection between 
entropy and free energies on one hand and large-deviation principles on the other extends 
into the realm of non-equilibrium systems. We restrict our focus to the important class of 
gradient flows, where this connection explains many aspects of these systems. Since the 
entropy appears as the driving force of the process, we will occasionally call this functional 
"energy" to conform with the standard terminology for gradient flows. 

The general philosophy is illustrated by the diagram below. 

    dynamic rate functional    <---- this paper ---->    gradient-flow structure
         $I$ or $I_h$                                         $J$ or $K_h$
              ^                                                    ^
              |  large-deviation principle                         |
              |  $n \to \infty$                                    |
    stochastic $n$-particle system  ---- continuum limit ---->  continuum evolution equation      (3)



The bottom row in this diagram is the classical connection between a stochastic $n$-particle
system and its hydrodynamic limit: the typical case is that as $n \to \infty$, the particle system
becomes deterministic, and the empirical measure of the particle system converges to the
solution of the (deterministic) continuum equation. Note that this statement concerns only
the typical behaviour of the particle system; large deviations are not captured.

In the left-hand column, a large-deviation principle characterises the behaviour in the
limit $n \to \infty$ in a different manner, in terms of a functional $I$ or $I_h$ of the time-dependent
system, as we shall see below. The right-hand column is the connection between an evolution
equation and the corresponding gradient-flow structure, when it exists.

The central statement of this paper is the double-headed arrow at the top. It provides
a connection between representations with more information on both sides: on the left-hand
side, the rate functional contains more information than just the most probable behaviour,
and on the right-hand side, the gradient-flow structure is an additional structure on top of
the equation itself.

In the following sections, we illustrate the double-headed arrow in a number of concrete
examples, first in the discrete-time approximation (Section 3) and then in continuous time
(Section 4). Section 5 generalises the argument to non-quadratic dissipations. Since the
implications of this connection are best appreciated once one has an overview of the breadth
of the phenomenon, we postpone most of the discussion of the consequences to Section 6.

The mathematical results described in this paper are not new, and mostly due to other
authors, such as Freidlin & Wentzell [10], Dawson & Gärtner [5, 6], Feng & Kurtz [9], Kipnis,
Olla, & Varadhan [12] and others. Instead, we see the novelty of this paper in extracting
from these results the suggestion of a general principle connecting the broad class of gradient 
flows with large deviations of stochastic processes. A particularly interesting aspect of this 
connection is that thermodynamic quantities are derived in a non-equilibrium context. 



2 The Wasserstein metric 

Much of this paper centres on the Wasserstein metric and Wasserstein gradient flows. The
(quadratic) Wasserstein distance between two probability measures $\rho_0$ and $\rho_1$ with finite
second moments is [18]

$$d(\rho_0, \rho_1)^2 = \inf_q \int_{\mathbb{R}^d \times \mathbb{R}^d} |x - y|^2 \, q(dx\,dy), \tag{4}$$

where the infimum is taken over all $q$ with marginals $\rho_0$ and $\rho_1$, i.e., over all $q$ satisfying

for any $A \subset \mathbb{R}^d$, $q(A \times \mathbb{R}^d) = \rho_0(A)$ and $q(\mathbb{R}^d \times A) = \rho_1(A)$.

We also need an incremental version of the Wasserstein distance. The Brenier-Benamou
formula [3] gives an alternative formulation of $d$ as an infimum over curves of measures $t \mapsto \rho(t)$
such that $\rho(0) = \rho_0$ and $\rho(1) = \rho_1$:

$$d(\rho_0, \rho_1)^2 = \inf_{\rho : [0,1] \to \mathcal{M}_1(\mathbb{R}^d)} \int_0^1 \|\partial_t \rho(t)\|^2_{\rho(t),*} \, dt. \tag{5}$$

Here the local norm $\|\cdot\|_{\rho,*}$ at a given point $\rho$ is derived from an inner product (a local metric
tensor) formally given by

$$(s_1, s_2)_{\rho,*} := \int_{\mathbb{R}^d} \rho(x) \nabla p_1(x) \cdot \nabla p_2(x) \, dx, \tag{6}$$

where $\nabla$ is the usual gradient in $\mathbb{R}^d$, and the $p_i$ solve the equation $\operatorname{div}(\rho \nabla p_i) = s_i$ in $\mathbb{R}^d$
(see [3, 13] or [9, Sec. 9.4] for a rigorous definition).

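A Wasserstein gradient flow is a gradient flow of an energy $\mathcal{E}$ with respect to the
Wasserstein metric structure. A curve of measures $t \mapsto \rho(t)$ is a solution of such a gradient-flow
equation if its time derivative $\partial_t \rho$, in the sense of distributions, satisfies

$$(\partial_t \rho(t), s_2)_{\rho(t),*} = -\int_{\mathbb{R}^d} \frac{\delta \mathcal{E}}{\delta \rho}(\rho(t)) \, s_2 \, dx \qquad \text{for all } s_2 \text{ and all } t > 0, \tag{7}$$

where $\delta \mathcal{E} / \delta \rho$ is the variational derivative of $\mathcal{E}$. A straightforward calculation shows that this
is equivalent to the equation

$$\partial_t \rho = \operatorname{div} \rho \nabla \Big( \frac{\delta \mathcal{E}}{\delta \rho} \Big). \tag{8}$$

By analogy with gradients in Riemannian geometry, this suggests defining the Wasserstein
gradient of a functional $\mathcal{E}$ as

$$\operatorname{grad}_W \mathcal{E}(\rho) := -\operatorname{div} \rho \nabla \Big( \frac{\delta \mathcal{E}}{\delta \rho} \Big). \tag{9}$$

Below we shall also use more general versions of this structure. Replacing $\rho$ above by a
general diffusion matrix $D(\rho)$, we define

$$(s_1, s_2)_{D(\rho),*} := \int D(\rho(x)) \nabla p_1(x) \cdot \nabla p_2(x) \, dx, \qquad \text{where } s_i = \operatorname{div} D(\rho) \nabla p_i. \tag{10}$$

Repeating the construction above, it follows that the $D$-Wasserstein gradient flow of a functional $\mathcal{E}$
is characterised by the equation

$$\partial_t \rho = \operatorname{div} D(\rho) \nabla \Big( \frac{\delta \mathcal{E}}{\delta \rho} \Big). \tag{11}$$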

Gradient flows have natural time-discrete approximations, constructed in an iterative
manner:

For given approximation $\rho^{k-1}$ at time $(k-1)h$, choose $\rho^k$ at time $kh$
as minimiser of the functional

$$\rho \mapsto \frac{1}{2h} d(\rho, \rho^{k-1})^2 + \mathcal{E}(\rho). \tag{12}$$

This is essentially a backward-Euler discretisation, as can be recognised by comparing it with
the $\mathbb{R}^d$-gradient flow $\dot{x} = -\nabla E(x)$. For this equation the backward-Euler discretisation is
constructed by solving

$$\frac{1}{h}(x_k - x_{k-1}) = -\nabla E(x_k)$$

for $x_k$, which is equivalent to minimising

$$x \mapsto \frac{1}{2h} |x - x_{k-1}|^2 + E(x). \tag{13}$$

Note the similarity between (13) and (12): in both expressions the first term measures the
distance between old and new states, while the second term favours a reduction of the
functional $\mathcal{E}$ respectively $E$.
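
The equivalence between the backward-Euler step and the minimisation (13) can be verified in a toy case. The sketch below assumes the one-dimensional energy $E(x) = x^2/2$ (chosen only for illustration), for which the backward-Euler equation has the closed-form solution $x_k = x_{k-1}/(1+h)$; the golden-section routine `minimise` is a generic helper, not something taken from the text.

```python
def minimise(f, lo, hi, iters=200):
    """Golden-section search for the minimiser of a unimodal f on [lo, hi]."""
    g = (5 ** 0.5 - 1) / 2
    a, b = lo, hi
    for _ in range(iters):
        c = b - g * (b - a)
        d = a + g * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2

def E(x):
    return 0.5 * x * x      # illustrative energy, so that grad E(x) = x

def variational_step(x_prev, h):
    """One step of (13): minimise |x - x_prev|^2 / (2h) + E(x)."""
    return minimise(lambda x: (x - x_prev) ** 2 / (2 * h) + E(x), -10.0, 10.0)

def backward_euler_step(x_prev, h):
    """Solve (x - x_prev) / h = -x for x; closed form for this quadratic E."""
    return x_prev / (1 + h)

h, x = 0.1, 1.0
x_var = variational_step(x, h)
x_be = backward_euler_step(x, h)
print(x_var, x_be)
```

Both numbers agree up to the tolerance of the search, since the stationarity condition of the functional in (13) is exactly the backward-Euler equation.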

3 Discrete time 

We can now formulate the first example.

3.1 A system of independent Brownian particles

We consider $n$ independent Brownian particles $X_{n,i}(t)$ in $\mathbb{R}^d$, with deterministic initial
positions $X_{n,i}(0) = x_{n,i}$, each hopping to a new position $X_{n,i}(h)$ at time $h > 0$ with a Gaussian
probability with mean $x_{n,i}$ and variance $2h$ in each coordinate (see footnote 1).

As in the equilibrium case discussed above, we describe this system by the empirical
measure $\rho_n(t) := \frac1n \sum_{i=1}^n \delta_{X_{n,i}(t)}$ at a given time $t$, and we assume that the initial measure
$\rho_n(0)$ converges to a given measure $\rho^0$ as $n \to \infty$. In the limit of large $n$, the probability
of this jump process attaining any $\rho^1$ at time $t = h$ is again characterised in terms of a
large-deviation principle,

$$\mathrm{Prob}(\rho_n(h) \approx \rho^1) \sim \exp[-n I_h(\rho^1)], \tag{14}$$

where the rate functional $I_h$ has an explicit expression that can be derived from Stirling's
formula (see [1] for the expression; in [1], $I_h$ is only the limit of a sequence of rate functionals,
but can be shown to be a rate functional in its own right [14, 17]).
The main result of [1] is that

$$I_h \approx K_h \qquad \text{as } h \to 0, \tag{15}$$

where

$$K_h(\rho^1|\rho^0) := \frac{1}{4h} d(\rho^0, \rho^1)^2 + \frac12 \mathrm{Ent}(\rho^1) - \frac12 \mathrm{Ent}(\rho^0). \tag{16}$$

Here $d$ is the Wasserstein distance defined above, and

$$\mathrm{Ent}(\rho) := H(\rho|\mathcal{L}) = \begin{cases} \int f \log f \, dx & \text{if } \rho \ll \mathcal{L} \text{ and } \rho = f\mathcal{L}, \\ +\infty & \text{otherwise,} \end{cases}$$

is the relative entropy of $\rho$ with respect to the Lebesgue measure $\mathcal{L}$. The rigorous formulation
of (15) is a Gamma-convergence result of $I_h$ to $K_h$ after both have been desingularised.

The functional $K_h$ has the same form as the functional in (12), up to an overall factor $\frac12$, since
the term $\mathrm{Ent}(\rho^0)/2$ does not influence the minimisation with respect to $\rho^1$. Therefore the
time-discrete approximation that one constructs with this $K_h$ is an approximation of the Wasserstein
gradient flow of the entropy $\mathrm{Ent}$, which is the diffusion equation [11]

$$\partial_t \rho = \Delta \rho \qquad \text{in } \mathbb{R}^d. \tag{17}$$

This is the connection referred to above: the large-deviation behaviour of the system of
particles is represented by the rate functional $I_h$, and this functional is asymptotically equal
to the functional $K_h$ that defines the gradient-flow formulation of the diffusion equation. The
approximation result (15) therefore creates a link between the gradient-flow structure of the
deterministic limit equation on one hand and the large-deviation behaviour of the system
of particles on the other. The same result can be shown for Gaussian measures on the real
line [8]. In the rest of this paper we shall see many more versions of such connections.
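
The link between the time-discrete scheme and the diffusion equation can be made tangible for Gaussians on the real line. The sketch below rests on two standard facts, stated here as assumptions of the illustration: for centred Gaussians $N(0, s_0^2)$ and $N(0, s^2)$ the Wasserstein distance is $|s - s_0|$, and $\mathrm{Ent}(N(0, s^2)) = -\log s + \text{constant}$. It further restricts the minimisation in (12) to centred Gaussians, so each step reduces to a scalar problem. The function names are hypothetical.

```python
import math

def jko_step_gaussian(s_prev, h):
    """Minimise (s - s_prev)^2 / (2h) - log(s) over s > 0.

    This is the scheme (12) with Ent as energy, restricted to centred
    Gaussians N(0, s^2). The stationarity condition s^2 - s_prev*s - h = 0
    has the positive root returned here.
    """
    return 0.5 * (s_prev + math.sqrt(s_prev ** 2 + 4 * h))

def variance_after(T, steps, s0=1.0):
    """Variance reached at time T after the given number of scheme steps."""
    h = T / steps
    s = s0
    for _ in range(steps):
        s = jko_step_gaussian(s, h)
    return s * s

T = 1.0
for steps in (10, 100, 1000):
    # The heat equation (17) started from N(0, 1) has variance 1 + 2T at time T.
    print(steps, variance_after(T, steps), 1.0 + 2 * T)
```

As the step size decreases, the variance produced by the scheme approaches $1 + 2T$, the variance of the solution of (17) with initial datum $N(0,1)$.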

1 In this paper, we consider Brownian particles with generator $\Delta$, rather than $\frac12 \Delta$, and therefore the
transition kernel is $(4\pi h)^{-d/2} \exp(-|x - y|^2 / 4h)$.

Consequences. While most of the discussion is deferred to Section 6, we mention here a few
consequences of the fact (15) that the large-deviation rate functional $I_h$ and the constructing
functional $K_h$ of the gradient flow are equal in the limit $h \to 0$.

First, the construction of a time-discrete approximation (12) to the diffusion equation (17)
was motivated by analogy with the backward-Euler discretisation (13). This is an
indirect and purely mathematical motivation, which explains neither the reason for the
appearance of the entropy and the Wasserstein distance in $K_h$, nor the reason for minimising
just this combination.

The connection between $K_h$ and $I_h$, however, gives a direct motivation. By (14)-(15),
$K_h(\rho^1|\rho^0)$ is a measure of the likelihood of observing a state $\rho^1$ after time $h$. For large $n$,
the characterisation (14) implies that only the global minimiser of $I_h$, and therefore of $K_h$, is
observed with non-vanishing probability. The stochastic minimisation (14) of $I_h$ thus becomes
converted into an absolute minimisation of $K_h$.

Secondly, in the limit $h \to 0$, the proof that $I_h \approx K_h$ explains the origin of the two terms
of $K_h$. The entropy arises from the indistinguishability of the particles after transforming to
an empirical measure. The origin of the Wasserstein cost functional $|x - y|^2$ in (4) can be
traced back to the exponent of the term $e^{-|x-y|^2/4h}$ in the Gaussian transition probability of
the Brownian particles. We return to this issue in Section 6.

4 Continuous time 

The construction in the previous section is discrete in time: the rate function $I_h$ describes the
probability distribution of the state $\rho_n(h)$ at time $h > 0$. A continuous-time large-deviation
principle, where one considers deviations from a whole path of empirical measures for a fixed 
terminal time, provides a different kind of insight, and may be even closer to the gradient-flow 
formulation. We start with some preliminaries. 

4.1 An alternative formulation of the gradient-flow structure 

In a formal sense, Wasserstein gradient flows and many others can be written in the form

$$\partial_t \rho = -M_\rho \frac{\delta \mathcal{E}}{\delta \rho}, \tag{18}$$

where $\mathcal{E}$ is the 'energy' functional driving the evolution, and $M_\rho$ a $\rho$-dependent symmetric
mapping (see footnote 2). In the case of Wasserstein gradient flows, for instance,

$$M_\rho \xi = -\operatorname{div} \rho \nabla \xi,$$

as follows by comparing (8) with (18). Taking this case of Wasserstein gradient flow as an
example, we shall encounter the equation (18) in a different form, connected to the functional
$J$ given by

$$J(\rho) := \mathcal{E}(\rho(T)) - \mathcal{E}(\rho(0)) + \frac12 \int_0^T \Big[ \|\partial_t \rho\|^2_{\rho,*} + \Big\| \frac{\delta \mathcal{E}}{\delta \rho} \Big\|^2_{\rho} \Big] dt, \tag{19}$$

2 This way of writing the gradient flow highlights the fact that a gradient flow is an instance of a GENERIC
evolution, in which the conservative evolution term is absent [15].

where

$$\|\xi\|^2_{\rho} := \int \xi \, M_\rho \xi \, dx$$

and the norm $\|\cdot\|_{\rho,*}$ is the norm defined in (6). The norms $\|\cdot\|_{\rho}$ and $\|\cdot\|_{\rho,*}$ are dual norms,
and $\|\cdot\|_{\rho,*}$ has the alternative characterisation

$$\|s\|_{\rho,*} := \sup_{\xi} \frac{\int s \xi \, dx}{\|\xi\|_{\rho}}.$$

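By writing the energy difference $\mathcal{E}(\rho(T)) - \mathcal{E}(\rho(0))$ as

$$\mathcal{E}(\rho(T)) - \mathcal{E}(\rho(0)) = \int_0^T \!\! \int \frac{\delta \mathcal{E}}{\delta \rho} \, \partial_t \rho \, dx \, dt = \int_0^T \Big( \partial_t \rho, \, M_\rho \frac{\delta \mathcal{E}}{\delta \rho} \Big)_{\rho,*} dt,$$

using the inner product defined in (6), the functional $J$ in (19) can now be written as

$$J(\rho) = \frac12 \int_0^T \Big\| \partial_t \rho + M_\rho \frac{\delta \mathcal{E}}{\delta \rho} \Big\|^2_{\rho,*} dt.$$

This expression shows that $J$ is non-negative. It also implies that if $\rho$ satisfies $J(\rho) = 0$, then
equation (18) holds at almost every time $0 < t < T$; therefore

$$\rho \text{ is a Wasserstein gradient flow of } \mathcal{E} \iff J(\rho) = 0. \tag{20}$$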



In the examples of this paper, J is a large-deviation rate functional, and this equivalence 
is the connection between the large-deviation behaviour, given by J, and the gradient-flow 
structure of the limiting equation. 

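If we take for the operator $M_\rho$ in (18) not the Wasserstein operator but a general operator,
then we find a similar statement:

$$\rho \text{ is a solution of the } (\mathcal{E}, M_\rho)\text{-gradient flow (18)} \iff J_M(\rho) = 0, \tag{21}$$

where

$$J_M(\rho) := \mathcal{E}(\rho(T)) - \mathcal{E}(\rho(0)) + \frac12 \int_0^T \Big[ \|\partial_t \rho\|^2_{M_\rho^{-1}} + \Big\| \frac{\delta \mathcal{E}}{\delta \rho} \Big\|^2_{M_\rho} \Big] dt, \tag{22}$$

and the two norms are defined, at least formally, by

$$\|\xi\|^2_{M_\rho} := \int \xi \, M_\rho \xi \, dx, \qquad \|s\|_{M_\rho^{-1}} := \sup_{\xi} \frac{\int s \xi \, dx}{\|\xi\|_{M_\rho}}, \qquad \text{so that} \quad \|s\|^2_{M_\rho^{-1}} = \int s \, M_\rho^{-1} s \, dx = \|M_\rho^{-1} s\|^2_{M_\rho}.$$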
We now discuss a number of examples.

4.2 Continuous-time large deviations for the diffusion equation

Taking the same system of particles as in Section 3.1, the continuous-time large-deviation
principle for that system of Brownian particles is as follows. Fix a terminal time $T > 0$ and
consider the whole path $[0,T] \to \mathcal{M}_1(\mathbb{R}^d)$ of empirical measures, $[0,T] \ni t \mapsto \rho_n(t)$. Then the
probability that the entire curve $\rho_n(\cdot)$ is close to some other $\rho(\cdot)$ is characterised [3, 13] by
a pathwise large-deviation principle,

$$\mathrm{Prob}(\rho_n \approx \rho) \sim \exp[-n I(\rho)],$$

where now

$$I(\rho) := \frac12 \int_0^T \|\partial_t \rho - \Delta \rho\|^2_{\rho(t),*} \, dt. \tag{23}$$

This rate function $I$ has the structure of $J$ in (19). Using the fact that

$$\Delta \rho = \operatorname{div} \rho \nabla \Big( \frac{\delta \mathrm{Ent}}{\delta \rho} \Big),$$

we find that

$$I(\rho) = \mathrm{Ent}(\rho(T)) - \mathrm{Ent}(\rho(0)) + \frac12 \int_0^T \Big[ \|\partial_t \rho\|^2_{\rho,*} + \Big\| \frac{\delta \mathrm{Ent}}{\delta \rho} \Big\|^2_{\rho} \Big] dt.$$

Therefore the Entropy-Wasserstein gradient flow is connected to the large-deviation behaviour
of a system of stochastic particles, in the sense of (20). We discuss this further in Section 6.



4.3 Diffusive particles with interactions

We extend the previous example by including interaction of the particles with a background
potential $\Psi$ and with each other via an interaction potential $\Phi$, modelled by Itô stochastic
differential equations. Specifically, we take the microscopic system of $n$ particles to be
described by

$$dX_i(t) = -\nabla \Psi(X_i(t)) \, dt - \frac1n \sum_{j=1}^n \nabla \Phi(X_i(t) - X_j(t)) \, dt + \sqrt{2} \, dW_i(t), \tag{24}$$

where for each $i$, $W_i$ is a Brownian motion in $\mathbb{R}^d$. The hydrodynamic limit of this system is
the equation

$$\partial_t \rho = \Delta \rho + \operatorname{div} \rho \nabla [\Psi + \rho * \Phi]. \tag{25}$$

The large-deviation rate functional describing fluctuations of the system is given by (see [9,
Theorem 13.37], and also [5] for weakly interacting diffusive particle systems)

$$I(\rho) := \frac12 \int_0^T \big\| \partial_t \rho - \Delta \rho - \operatorname{div} \rho \nabla [\Psi + \rho * \Phi] \big\|^2_{\rho,*} \, dt, \tag{26}$$

which again can be written as

$$I(\rho) = \mathcal{F}(\rho(T)) - \mathcal{F}(\rho(0)) + \frac12 \int_0^T \Big[ \|\partial_t \rho\|^2_{\rho,*} + \Big\| \frac{\delta \mathcal{F}}{\delta \rho} \Big\|^2_{\rho} \Big] dt,$$

where the free energy $\mathcal{F}$ is given by the sum of entropy and potential energy,

$$\mathcal{F}(\rho) := \mathrm{Ent}(\rho) + \int \Big[ \rho \Psi + \frac12 \rho (\rho * \Phi) \Big]. \tag{27}$$

Indeed, equation (25) is the Wasserstein gradient flow of the functional $\mathcal{F}$.
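
The microscopic system (24) is straightforward to simulate. The following sketch uses an Euler-Maruyama discretisation in one dimension with the quadratic potentials $\Psi(x) = x^2/2$ and $\Phi(x) = \kappa x^2/2$; these potentials, the parameter values and the function name `simulate` are assumptions chosen for illustration, because for them the interaction sum reduces to $\kappa (X_i - \bar{X})$ and the stationary solution of (25) is a Gaussian with variance $1/(1+\kappa)$.

```python
import math
import random

def simulate(n=2000, T=5.0, dt=0.01, kappa=0.5, seed=0):
    """Euler-Maruyama for (24) with Psi(x) = x^2/2, Phi(x) = kappa*x^2/2, d = 1.

    For this quadratic Phi the term (1/n) sum_j Phi'(X_i - X_j) equals
    kappa * (X_i - mean), so every time step costs O(n).
    """
    rng = random.Random(seed)
    xs = [0.0] * n
    noise = math.sqrt(2 * dt)
    for _ in range(int(round(T / dt))):
        mean = sum(xs) / n
        xs = [x - x * dt - kappa * (x - mean) * dt + noise * rng.gauss(0.0, 1.0)
              for x in xs]
    return xs

xs = simulate()
mean = sum(xs) / len(xs)
var = sum((x - mean) ** 2 for x in xs) / len(xs)
print(var, 1 / (1 + 0.5))
```

The empirical variance settles near $1/(1+\kappa)$, up to sampling noise and time-discretisation error; this only probes the typical behaviour (the bottom row of diagram (3)), not the large deviations described by (26).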
4.4 The Symmetric Simple Exclusion Process

The diffusion equation (17) is the continuum limit for various stochastic processes, one of
which is the system of Brownian particles described above. Here we briefly describe the
symmetric simple exclusion process, which has the same limiting equation in a parabolic
scaling. However, it has a different large-deviation behaviour, which gives rise to a different
gradient flow.

Consider a periodic lattice $\mathbb{T}_n = \{0, 1/n, 2/n, \dots, (n-1)/n\}$ and its continuum limit, the
flat torus $\mathbb{T} = \mathbb{R}/\mathbb{Z}$. Each lattice site contains zero or one particle; each particle attempts to
jump to a neighbouring site with rate $n^2/2$, and the jump succeeds if the target site is empty.
We define the configuration $\rho_n : \mathbb{T}_n \to \{0, 1\}$ such that $\rho_n(k/n) = 1$ if there is a particle at
site $k/n$, and zero otherwise. For this system the large deviations are characterised by the
rate function [12]

$$I(\rho) := \frac12 \int_0^T \|\partial_t \rho - \partial_{xx} \rho\|^2_{\rho(1-\rho),*} \, dt, \tag{28}$$

where the norm $\|\cdot\|_{\rho(1-\rho),*}$ is given by (10) with $D(\rho) = \rho(1-\rho)$. This functional can be
written as

$$I(\rho) = \mathrm{Ent}_{\mathrm{mix}}(\rho(T)) - \mathrm{Ent}_{\mathrm{mix}}(\rho(0)) + \frac12 \int_0^T \Big[ \|\partial_t \rho\|^2_{\rho(1-\rho),*} + \Big\| \frac{\delta \mathrm{Ent}_{\mathrm{mix}}}{\delta \rho} \Big\|^2_{\rho(1-\rho)} \Big] dt,$$

where $\|\cdot\|_{\rho(1-\rho)}$ denotes the norm dual to $\|\cdot\|_{\rho(1-\rho),*}$, and the mixing entropy $\mathrm{Ent}_{\mathrm{mix}}$ is defined as

$$\mathrm{Ent}_{\mathrm{mix}}(\rho) := \int \big[ \rho \log \rho + (1-\rho) \log(1-\rho) \big].$$

This is true since $-\partial_{xx} \rho$ is the '$\rho(1-\rho)$'-Wasserstein gradient of $\mathrm{Ent}_{\mathrm{mix}}$, by

$$\partial_{xx} \rho = \partial_x \Big[ \rho(1-\rho) \, \partial_x \log \frac{\rho}{1-\rho} \Big] = \partial_x \Big[ \rho(1-\rho) \, \partial_x \frac{\delta \mathrm{Ent}_{\mathrm{mix}}}{\delta \rho}(\rho) \Big]$$

(compare this to (11)). Therefore $I$ is of the form (22), with operator

$$M_\rho \xi := -\operatorname{div} \rho(1-\rho) \nabla \xi,$$

and the equation $\partial_t \rho = \partial_{xx} \rho$ is (also) the gradient flow of $\mathrm{Ent}_{\mathrm{mix}}$ with respect to this
'$\rho(1-\rho)$'-Wasserstein structure $\|\cdot\|_{\rho(1-\rho),*}$.

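5 Further generalisations

The arguments of the integrals in (5), (23), (26), and (28) are quadratic. This arises from
a parabolic rescaling and the central limit theorem, and it leads to a gradient flow with a
(formal) inner-product structure, or equivalently, to a linear operator $M_\rho$ in (18). Other types
of randomness lead to non-quadratic gradient-flow structures, as we now describe.

A close inspection of the arguments of Section 4.1 shows that they hinge on the inequality

$$\partial_t \mathcal{E}(\rho) \geq -\frac12 \|\partial_t \rho\|^2_{M_\rho^{-1}} - \frac12 \Big\| \frac{\delta \mathcal{E}}{\delta \rho} \Big\|^2_{M_\rho},$$

together with the observation that equality holds if and only if $\partial_t \rho = -M_\rho \, \delta \mathcal{E} / \delta \rho$. This
can be generalised by introducing a Legendre pair of convex functions $\psi_\rho$ and $\psi_\rho^*$, where the
subscript $\rho$ serves to indicate that they may depend on $\rho$, in the same way as the operator $M_\rho$
does; in this context, $\psi_\rho$ is often called the dissipation potential. In terms of this pair we then
derive that

$$\partial_t \mathcal{E}(\rho(t)) = \Big\langle \frac{\delta \mathcal{E}}{\delta \rho}, \partial_t \rho \Big\rangle \geq -\psi_\rho^*(\partial_t \rho) - \psi_\rho \Big( -\frac{\delta \mathcal{E}}{\delta \rho} \Big),$$

and equality holds if and only if

$$\partial_t \rho \in \partial \psi_\rho \Big( -\frac{\delta \mathcal{E}}{\delta \rho} \Big). \tag{29}$$

The case of the $M$-gradient flow (18) corresponds to

$$\psi_\rho(\xi) := \frac12 \|\xi\|^2_{M_\rho} \qquad \text{and} \qquad \psi_\rho^*(s) := \frac12 \|s\|^2_{M_\rho^{-1}}.$$

The obvious generalisation of (20) then is

$$\rho \text{ is a solution of the } (\mathcal{E}, \psi)\text{-gradient flow (29)} \iff J_\psi(\rho) = 0, \tag{30}$$

where $J_\psi$ is given by

$$J_\psi(\rho) := \mathcal{E}(\rho(T)) - \mathcal{E}(\rho(0)) + \int_0^T \Big[ \psi_\rho^*(\partial_t \rho) + \psi_\rho \Big( -\frac{\delta \mathcal{E}}{\delta \rho} \Big) \Big] dt. \tag{31}$$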



5.1 Birth-death processes

A simple example of a stochastic process with non-quadratic dissipation $\psi$ and a corresponding
generalised gradient flow is a birth-death process, which is a continuous-time jump process
on $\mathbb{Z}$. The system may only jump to neighbours, from position $k$ with rate $a_k$ to $k + 1$ and
with rate $b_k$ to $k - 1$. We construct a continuum limit by defining the new stochastic variable
$U_n$ by rescaling time $t$ and position $k(t)$ with $n$:

$$U_n(t) := \frac1n k(nt).$$

A standard argument gives the large-deviation behaviour for $U_n$ in terms of the rate functional
(see [1] for a finite-lattice proof of the claims made below). If we choose the jump rates so
that

$$a_k = \alpha e^{-\mathcal{E}'(k/n)} \qquad \text{and} \qquad b_k = \alpha e^{+\mathcal{E}'(k/n)}$$

for $\alpha > 0$ and some smooth function $\mathcal{E} : \mathbb{R} \to \mathbb{R}$, then the rate functional is

$$I(u) = \int_0^T L(u(t), \dot{u}(t)) \, dt,$$

with

$$L(u, v) = v \log \frac{v + \sqrt{v^2 + 4\alpha^2}}{2\alpha \exp(-\mathcal{E}'(u))} - \sqrt{v^2 + 4\alpha^2} + \alpha e^{-\mathcal{E}'(u)} + \alpha e^{+\mathcal{E}'(u)}.$$

Writing

$$\psi^*(v) = v \log \frac{v + \sqrt{v^2 + 4\alpha^2}}{2\alpha} - \sqrt{v^2 + 4\alpha^2},$$

it follows that $\psi(\xi) = \alpha (e^{\xi} + e^{-\xi})$, and $L(u, v) = \psi^*(v) + \psi(-\mathcal{E}'(u)) + v \, \mathcal{E}'(u)$, so that $I$ can be written in the form (31).
The corresponding generalised gradient flow in $\mathbb{R}$, given by (29), reads

$$\dot{u} = 2\alpha \sinh(-\mathcal{E}'(u)).$$

Observe how this differs from the standard (quadratic-dissipation) gradient flow, which is
$\dot{u} = -\mathcal{E}'(u)$; the non-quadratic dissipation preserves the sign of the velocity, but not its
amplitude. Because of the preservation of sign, the energy $\mathcal{E}$ is monotonic along a solution:

$$\frac{d}{dt} \mathcal{E}(u(t)) = \mathcal{E}'(u(t)) \, \dot{u}(t) = 2\alpha \, \mathcal{E}'(u) \sinh(-\mathcal{E}'(u)) \leq 0.$$

This example shows how the connection between large-deviation principles and (generalised)
gradient flows extends to the case of non-quadratic dissipations. Note that here the
large deviations refer to a single process and hence are not due to an averaging process
as in the empirical measure case.
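
The structure of the birth-death Lagrangian can be checked numerically. The sketch below assumes the illustrative energy $\mathcal{E}(u) = u^2/2$ and an arbitrary value of $\alpha$, and uses the pair $\psi(\xi) = \alpha(e^{\xi} + e^{-\xi})$, $\psi^*(v) = v \log\big((v + \sqrt{v^2 + 4\alpha^2})/2\alpha\big) - \sqrt{v^2 + 4\alpha^2}$. By the Fenchel-Young inequality the combination $\psi^*(v) + \psi(-\mathcal{E}'(u)) + v\,\mathcal{E}'(u)$ is non-negative and vanishes exactly at $v = \psi'(-\mathcal{E}'(u)) = 2\alpha \sinh(-\mathcal{E}'(u))$.

```python
import math

alpha = 1.5                  # jump-rate prefactor (illustrative value)

def dE(u):
    return u                 # E(u) = u^2/2, an illustrative energy

def psi(xi):
    return alpha * (math.exp(xi) + math.exp(-xi))

def psi_star(v):
    root = math.sqrt(v * v + 4 * alpha * alpha)
    return v * math.log((v + root) / (2 * alpha)) - root

def lagrangian(u, v):
    """L(u, v) = psi*(v) + psi(-E'(u)) + v * E'(u)."""
    return psi_star(v) + psi(-dE(u)) + v * dE(u)

def velocity(u):
    """Right-hand side of the generalised gradient flow, psi'(-E'(u))."""
    return 2 * alpha * math.sinh(-dE(u))

u = 0.7
print(lagrangian(u, velocity(u)))   # vanishes up to rounding
print(lagrangian(u, -dE(u)))        # strictly positive: the quadratic flow is not optimal
print(dE(u) * velocity(u))          # negative: the energy decreases along the flow
```

The zero of the Lagrangian picks out the $\sinh$-type velocity rather than the linear velocity $-\mathcal{E}'(u)$ of the quadratic-dissipation gradient flow.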



5.2 Spin-flip processes

For $n \in \mathbb{N}$, let $\mathbb{T}_n$ be the one-dimensional $n$-torus ($\mathbb{Z}/n\mathbb{Z}$). An Ising spin at sites of $\mathbb{T}_n$ takes
values in $\{-1, +1\}$ and is subject to a rate-1 independent spin-flip dynamics. We consider
the trajectory of the magnetisation, i.e., $t \mapsto m_n(t) = \frac1n \sum_{i \in \mathbb{T}_n} \sigma_i(t)$, where $\sigma_i(t)$ is the spin
at site $i \in \mathbb{T}_n$ at time $t$. The generator for the process $(m_n(t))_{t \geq 0}$ is given by

$$(A_n f)(m) = \frac{1+m}{2} \, n \big[ f(m - 2n^{-1}) - f(m) \big] + \frac{1-m}{2} \, n \big[ f(m + 2n^{-1}) - f(m) \big]$$

for $m \in \{-1, -1 + 2n^{-1}, \dots, 1\}$. The trajectory of the magnetisation satisfies a large-deviation
principle, i.e., for every trajectory $\gamma = (\gamma_t)_{t \in [0,T]}$,

$$\mathrm{Prob}\big( (m_n(t))_{t \in [0,T]} \approx (\gamma_t)_{t \in [0,T]} \big) \sim \exp \Big[ -n \int_0^T L(\gamma_t, \dot{\gamma}_t) \, dt \Big],$$

where the Lagrangian $L$ can be computed following the scheme of Feng and Kurtz [9,
Example 1.5]. We obtain

$$L(m, q) = \frac{q}{2} \log \Big( \frac{q + \sqrt{q^2 + 4(1 - m^2)}}{2(1 - m)} \Big) - \frac12 \sqrt{q^2 + 4(1 - m^2)} + 1.$$

This can similarly be written as $\psi^*(q) + \psi(-\mathcal{E}'(m)) + q \, \mathcal{E}'(m)$, where

$$\psi^*(q) = \frac{q}{2} \log \Big( \frac{q + \sqrt{q^2 + 4(1 - m^2)}}{2\sqrt{1 - m^2}} \Big) - \frac12 \sqrt{q^2 + 4(1 - m^2)}$$

and

$$\psi(\xi) = \frac12 \sqrt{1 - m^2} \, \big( \exp(2\xi) + \exp(-2\xi) \big);$$

the involved energy is

$$\mathcal{E}(m) = \frac14 (1 + m) \log(1 + m) + \frac14 (1 - m) \log(1 - m).$$

Then the limiting equation (29) can be written as $\dot{m} = -2m$. This is consistent with the
optimal trajectory via the Euler-Lagrange equation, $m(t) = m_0 e^{-2t}$.
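
The limiting equation can be verified by a short computation. The following lines are a sketch that assumes the normalisation $\mathcal{E}'(m) = \frac14 \log\frac{1+m}{1-m}$, which is the one for which the decomposition of $L$ into $\psi^*$, $\psi$ and the cross term holds with $\psi(\xi) = \frac12\sqrt{1-m^2}\,(e^{2\xi} + e^{-2\xi})$.

```latex
\psi'(\xi) = \sqrt{1-m^2}\,\bigl(e^{2\xi} - e^{-2\xi}\bigr),
\qquad
e^{-2\mathcal{E}'(m)} = \sqrt{\frac{1-m}{1+m}},
\qquad
e^{+2\mathcal{E}'(m)} = \sqrt{\frac{1+m}{1-m}},
\\[1ex]
\dot m = \psi'\bigl(-\mathcal{E}'(m)\bigr)
       = \sqrt{1-m^2}\;\frac{(1-m) - (1+m)}{\sqrt{1-m^2}}
       = -2m .
```

With the same substitution one also finds $\psi(-\mathcal{E}'(m)) = \frac12 (1-m) + \frac12 (1+m) = 1$, which accounts for the constant $+1$ in the Lagrangian.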
6 Discussion



In the sections above we have described a number of pairs of systems, each consisting of 
a stochastic process and its continuum limit. Each pair has the property that the large 
deviations of the stochastic process are closely linked to a gradient-flow structure of the limit 
equation. These links are time-dynamic versions of the equilibrium connection mentioned in 
the introduction. We now describe how this provides us insight into the properties of the 
gradient-flow structures for each pair. 

6.1 Wasserstein gradient flows 

We claim that the Wasserstein metric characterises the mobility of the empirical measure of a 
large number of Brownian particles. Indeed, this claim can be made meaningful in a number 
of different ways: 

1. In discrete time, letting $\rho_n$ be the empirical measure of a system of Brownian particles,
we have

$$\mathrm{Prob}(\rho_n(h) \approx \rho^1 \,|\, \rho_n(0) \approx \rho^0) \sim e^{-n \, d(\rho^0, \rho^1)^2 / 4h} \qquad \text{as } n \to \infty,$$

which follows from (16) and was proved independently in [14].

2. In continuous time, for the whole path $\rho_n : [0,T] \to \mathcal{M}_1(\mathbb{R}^d)$ of empirical measures up
to a fixed terminal time $T$, we have

$$\mathrm{Prob}(\rho_n \approx \rho) \sim e^{-n I(\rho)} \qquad \text{as } n \to \infty,$$

where $I$, defined in (23), measures the size of the deviation by the Wasserstein metric
tensor $\|\cdot\|_{\rho,*}$.

3. When the particles also undergo a deterministic drift, the same statement holds with $I$
defined by (26), where again the size of the deviation is measured by the norm $\|\cdot\|_{\rho,*}$.

The origin of this role of the Wasserstein metric as the mobility of Brownian particles can
be understood by considering the geometric relationship between $(\mathbb{R}^d)^n$ and the space of
measures endowed with the Wasserstein distance. Consider the embedding

$$e : (\mathbb{R}^d)^n \to (\mathcal{M}_1(\mathbb{R}^d), d), \qquad (x_1, \dots, x_n) \mapsto \frac1n \sum_{i=1}^n \delta_{x_i}.$$

Note that $e$ is not one-to-one, since the numbering of the particles is lost: the particles
have become indistinguishable. Indeed, one can identify the set of empirical measures of
the form $n^{-1} \sum_i \delta_{x_i}$ with the space obtained by identifying all elements in $(\mathbb{R}^d)^n$ that are
rearrangements of each other, i.e., the quotient space $(\mathbb{R}^d)^n / S_n$, where $S_n$ is the set of all
permutations of $n$ elements.

Now the Wasserstein metric on $\mathcal{M}_1$ makes the embedding of $(\mathbb{R}^d)^n / S_n$ in $\mathcal{M}_1(\mathbb{R}^d)$
isometric. This follows from the simple property that

$$d \Big( \frac1n \sum_{i=1}^n \delta_{x_i}, \frac1n \sum_{j=1}^n \delta_{y_j} \Big)^2 = \min_{\sigma \in S_n} \frac1n \sum_{i=1}^n |x_i - y_{\sigma(i)}|^2. \tag{32}$$
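
The minimum over rearrangements in (32) can be computed by brute force for small $n$. The sketch below does this for two illustrative point sets on the real line (the data and function names are assumptions) and compares the result with the cost of pairing sorted points, which is the standard optimal coupling for the quadratic cost in one dimension.

```python
from itertools import permutations

def cost(xs, ys):
    """Mean squared displacement (1/n) * sum_i |x_i - y_i|^2 for a given pairing."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def w2_squared_bruteforce(xs, ys):
    """Right-hand side of (32): minimum over all rearrangements of ys."""
    return min(cost(xs, perm) for perm in permutations(ys))

def w2_squared_sorted(xs, ys):
    """In one dimension the optimal rearrangement pairs the sorted points."""
    return cost(sorted(xs), sorted(ys))

xs = [0.3, -1.2, 2.5, 0.9, -0.4]
ys = [1.1, 0.0, -2.0, 3.2, 0.6]
print(w2_squared_bruteforce(xs, ys), w2_squared_sorted(xs, ys))
```

Both functions return the same value, and relabelling the points of either set leaves it unchanged, which is the indistinguishability expressed by the quotient $(\mathbb{R}^d)^n / S_n$.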
With this property the role of the Wasserstein distance can be fully explained. The
Freidlin-Wentzell theory for Brownian particles [10] shows that a vector $X = (X_1, \dots, X_n)$
of $n$ Brownian particles has a stochastic mobility given by the Euclidean norm
$(x, y) \mapsto \sum_i |x_i - y_i|^2$, in the sense that

$$\mathrm{Prob}(X(h) \approx y \,|\, X(0) \approx x) \sim \exp \Big[ -\frac{1}{4h} \sum_{i=1}^n |x_i - y_i|^2 \Big] \qquad \text{for small } h.$$

The loss of information upon introducing indistinguishability, or equivalently upon transforming
to empirical measures, implies by the contraction principle (e.g., Sec. 4.2.1) that the
exponent $(1/4h) \sum_i |x_i - y_i|^2$ becomes replaced by its minimum under rearrangement,

$$\min_{\sigma \in S_n} \frac{1}{4h} \sum_{i=1}^n |x_i - y_{\sigma(i)}|^2.$$

This expression is equal to $n/(4h)$ times (32). If we gloss over the approximations in different
limits ($h \to 0$ and $n \to \infty$), this explains how the Wasserstein distance is the natural measure
of the mobility of an empirical measure of Brownian particles, through transformation of the
original mobility of a single Brownian particle.

6.2 Consequences for modelling

Gradient flows can be thought of as overdamped systems, in the sense that any inertial
effects are damped out quickly by the effects of viscous, frictional, or other damping forces,
and can therefore be neglected. One way of modelling such overdamped systems is therefore
by assuming an abstract gradient-flow structure from the start and making it concrete by
postulating an energy $\mathcal{E}$ and a dissipation potential $\psi$. These choices should be motivated,
and in the case of Wasserstein and Wasserstein-like dissipations this motivation is non-trivial.

One area where this is particularly visible is in the modelling of lower-dimensional
structures, such as threads and surfaces, moving through a viscous fluid. The biology of sub-cell
structures knows many such examples, including microtubules and lipid bilayers. The
assumption of overdampedness is reasonable in this viscosity-dominated situation, but the interplay
of geometry and mechanics makes the direct formulation of evolution equations complicated
and error-prone (see, e.g., [2]). In this context, the construction of evolution equations through
the postulation of energy and dissipation is often simpler and allows for clearer separation of
the various assumptions. However, it remains necessary to motivate the choices made for the
energy and the dissipation.

To take the Wasserstein metric as an example, its interpretation as the measure of mobility
of empirical measures of Brownian particles provides such a motivation, and because of the
connection to the Brownian mobility of the particles it also allows for generalisation to other
situations.

But similar arguments apply to other dissipations, coupled to other underlying stochastic
processes. For instance, the symmetric simple exclusion process leads to $\rho(1-\rho)$ mobility,
implying that if such an exclusion process is one's idea of the underlying system, then the
$\rho(1-\rho)$-dissipation is the natural choice.

One might go even further. The diffusion equation (17) is known to be a gradient flow
in many different ways; in addition to the two mentioned above, also as the $L^2$-gradient flow
of the Dirichlet integral $\int |\nabla \rho|^2$, for instance, as the $H^{-1}$-gradient flow of the $L^2$-norm, and
even as the $H^{s-1}$-gradient flow of the $H^s$-seminorm for each $s \in \mathbb{R}$. For the two structures
that we have discussed, the different underlying stochastic processes provide clear reasons for
the differing dissipations and energies. Here we formulate the

Conjecture 1 Each gradient-flow structure can be connected to an appropriate stochastic
process via a large-deviation principle.

To the extent that this conjecture turns out to be true, it provides an explanation for the 
occurrence of multiple gradient-flow formulations of the same differential equation. 



6.3 Geometry and reversibility 



There are interesting connections between the geometry of the Brownian noise, the reversibility
of the stochastic process, and the question whether the resulting evolution equation is a
gradient flow or not.

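This becomes apparent when we modify the system of Section 4.3 by introducing a diffusion
matrix $A \in \mathbb{R}^{d\times d}$ and replacing the scalar $\sigma$ by a mobility matrix $\sigma \in \mathbb{R}^{d\times d}$, thus
obtaining

$$dX_i(t) = -A\nabla\Psi(X_i(t))\,dt - \frac{1}{n}\sum_{j=1}^{n} A\nabla\Phi(X_i(t) - X_j(t))\,dt + \sqrt{2}\,\sigma\,dW_i(t). \tag{33}$$

The large-deviation rate functional of the system is similarly given by

$$I(\rho) := \frac{1}{2}\int_0^T \bigl\| \partial_t\rho - \operatorname{div} \sigma\sigma^T\nabla\rho - \operatorname{div} \rho A\nabla[\Psi + \rho * \Phi] \bigr\|^2 \, dt, \tag{34}$$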



where the norm $\|\cdot\|$ is induced by (10) with $D(\rho) = \rho\sigma\sigma^T$. The formula (34) implies
that the hydrodynamic limit of this system is the minimiser of $I$, satisfying

$$\partial_t\rho = \operatorname{div} \sigma\sigma^T\nabla\rho + \operatorname{div} \rho A\nabla[\Psi + \rho * \Phi]. \tag{35}$$

With this additional parameter freedom, it is not always possible to write (34) in the
form (22). This depends on whether the cross term in (34) is an exact differential, i.e.,
whether there exists a functional $\mathcal{E}$ such that

$$\bigl\langle \partial_t\rho,\; -\operatorname{div} \sigma\sigma^T\nabla\rho - \operatorname{div} \rho A\nabla[\Psi + \rho * \Phi] \bigr\rangle_{D(\rho),*} = \partial_t \mathcal{E}(\rho).$$

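This is the case if and only if $\sigma\sigma^T$ is a positive multiple of $A$, a condition that is familiar
from the fluctuation-dissipation theorem. In that case, and writing $\sigma\sigma^T = kTA$ for some
'temperature' $T > 0$ and the Boltzmann constant $k$,

$$-\operatorname{div} \sigma\sigma^T\nabla\rho - \operatorname{div} \rho A\nabla[\Psi + \rho * \Phi] = M_\rho \frac{\delta\mathcal{F}}{\delta\rho},$$

where $M_\rho\xi$ is defined as $-\operatorname{div} D(\rho)\nabla\xi$ and the free energy $\mathcal{F}$ is a modification of ([2

$$\mathcal{F}(\rho) := \mathrm{Ent}(\rho) + \frac{1}{kT}\int \Bigl[\Psi + \tfrac{1}{2}\,\rho * \Phi\Bigr]\, d\rho.$$

Then the rate functional $I$ can be written in the form (19) as

$$I(\rho) = \mathcal{F}(\rho(T)) - \mathcal{F}(\rho(0)) + \frac{1}{2}\int_0^T \Bigl[ \|\partial_t\rho\|^2_{D(\rho),*} + \Bigl\|\frac{\delta\mathcal{F}}{\delta\rho}\Bigr\|^2_{D(\rho)} \Bigr]\, dt,$$

and the evolution equation (35) is the (modified, $D$-) Wasserstein gradient flow of $\mathcal{F}$.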

Our freedom to choose $A$ and $\sigma$ separately gives us the insight that for this system the
following four statements are equivalent:

1. $\sigma\sigma^T = kTA$ for some $T > 0$;

2. The evolution (35) is a $D(\rho)$-Wasserstein gradient flow of $\mathcal{F}$;

3. The rate functional $I$ can be written in the form (19);

4. For any finite number $n$ of particles, the system (33) is reversible.

We expect that such an equivalence property, including the reversibility of the microscopic 
system, might hold more generally. 
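
The step from statement 1 to statement 2 can be checked by a direct computation. The following is a sketch rather than a proof. It assumes that $A$ and $\sigma$ are constant matrices, that $\mathrm{Ent}(\rho) = \int \rho\log\rho$, that $\Phi$ is even, and that $\mathcal{F}(\rho) = \mathrm{Ent}(\rho) + \frac{1}{kT}\int[\Psi + \frac12 \rho*\Phi]\,d\rho$; these assumptions are not restated in this section.

```latex
% Sketch: with \sigma\sigma^T = kTA, the right-hand side of (35) is a single divergence.
% Uses \nabla\rho = \rho\nabla\log\rho and D(\rho) = \rho\sigma\sigma^T = kT\rho A.
\begin{align*}
\operatorname{div}\sigma\sigma^T\nabla\rho + \operatorname{div}\rho A\nabla[\Psi + \rho*\Phi]
  &= kT\operatorname{div}\rho A\nabla\log\rho + \operatorname{div}\rho A\nabla[\Psi + \rho*\Phi] \\
  &= \operatorname{div} D(\rho)\nabla\Bigl[\log\rho + \tfrac{1}{kT}\bigl(\Psi + \rho*\Phi\bigr)\Bigr].
\end{align*}
% Under the stated assumptions the variational derivative of the free energy is
\begin{equation*}
\frac{\delta\mathcal{F}}{\delta\rho} = \log\rho + 1 + \tfrac{1}{kT}\bigl(\Psi + \rho*\Phi\bigr),
\end{equation*}
% whose gradient agrees with the bracket above, so that (35) reads
\begin{equation*}
\partial_t\rho = \operatorname{div} D(\rho)\nabla\frac{\delta\mathcal{F}}{\delta\rho} = -M_\rho\frac{\delta\mathcal{F}}{\delta\rho}.
\end{equation*}
```

If $\sigma\sigma^T$ is not a multiple of $A$, the two divergence terms carry different matrices and cannot be combined into one gradient of a single potential in this way.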

6.4 Diffusion with decay 

Yet another generalisation concerns systems with decay, which is implemented as a jump
process. In [17], Peletier and Renger have derived a similar connection for the case of
diffusing particles that are convected and may also decay, given by the equation (in one space
dimension)

$$\partial_t\rho = \partial_{xx}\rho + \partial_x(\rho\,\partial_x\Psi) - \lambda\rho, \tag{36}$$

with $\Psi \in C_b^2(\mathbb{R})$ and $\lambda > 0$.

In [17], the particles perform a Brownian motion in the spatial dimension, augmented by
a deterministic drift given by $-\partial_x\Psi$. This part of the process gives rise to the two terms
$\partial_{xx}\rho + \partial_x(\rho\,\partial_x\Psi)$. In addition, the particles change their state from 'normal' to 'decayed',
after an exponentially distributed time; this part gives rise to the term $-\lambda\rho$. The opposite
transition is not allowed: decay is irreversible.

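An analysis similar to Section 3.1 then connects the large-deviation rate functional for
this stochastic particle system to a corresponding minimisation problem describing the
time-discrete evolution, i.e., the equivalent of (12). In this case the time-discrete minimisation
problem is

$$\begin{aligned}
\rho^k \in \operatorname*{argmin}_{\rho}\; \inf_{\rho_{ND}:\, |\rho + \rho_{ND}| = |\rho^{k-1}|}\;
  & -\tfrac{1}{2}\mathcal{F}(\rho + \rho_{ND}) - \tfrac{1}{2}\mathcal{F}(\rho^{k-1}) + \frac{1}{4h}\, d(\rho + \rho_{ND}, \rho^{k-1})^2 \\
  & + \mathcal{F}(\rho) + \mathcal{F}(\rho_{ND}) - |\rho| \log e^{-\lambda h} - |\rho_{ND}| \log(1 - e^{-\lambda h}),
\end{aligned} \tag{37}$$

where $|\rho| := \int \rho$ and the free energy $\mathcal{F}$ is defined as

$$\mathcal{F}(\rho) = \mathrm{Ent}(\rho) + \int \Psi \, d\rho.$$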

In [17], the authors explain how the structure of (37) can be understood: if we define

$$\begin{aligned}
K_h(\bar\rho; \rho^{k-1}) &:= \tfrac{1}{2}\mathcal{F}(\bar\rho) - \tfrac{1}{2}\mathcal{F}(\rho^{k-1}) + \frac{1}{4h}\, d(\bar\rho, \rho^{k-1})^2, \\
K_{Dec}(\rho; \bar\rho) &:= \mathcal{F}(\rho) - \mathcal{F}(\bar\rho) + \mathcal{F}(\bar\rho - \rho) - |\rho| \log e^{-\lambda h} - |\bar\rho - \rho| \log(1 - e^{-\lambda h}),
\end{aligned}$$

then the terms inside the infimum in (37) can be written as $K_h(\rho + \rho_{ND}; \rho^{k-1}) + K_{Dec}(\rho; \rho + \rho_{ND})$.
In this decomposition, the first term describes diffusion and convection by $\Psi$ of the
joint measure $\rho + \rho_{ND}$ starting from the previous state $\rho^{k-1}$, similar to (16) and (12). The
second term describes the decay process, in which the joint diffused-and-convected measure
$\rho + \rho_{ND}$ is split into a part $\rho$ that remains 'normal' and the remainder $\rho_{ND}$ that becomes
decayed.

While the structure of (37) is not the same as (12), and (37) does not represent a time
discretisation of a gradient flow, both are minimisation problems that define the next step in
the iteration, and in both cases one can identify a driving force (the free energy $\mathcal{F}$, in the case
of (37)) and a mechanism that acts as a brake. In $K_h$ the 'brake' is the Wasserstein metric
$d(\bar\rho, \rho^{k-1})^2/4h$, and in $K_{Dec}$ it is the two terms $-|\rho| \log e^{-\lambda h} - |\bar\rho - \rho| \log(1 - e^{-\lambda h})$. In both
cases these terms restrict the movement of $\bar\rho$ and $\rho$, respectively, and this restriction becomes more
and more severe as $h \to 0$.
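
The behaviour of the decay 'brake' for small time steps can be illustrated numerically. The sketch below evaluates the two logarithmic terms with the signs used in (37); the function name `decay_brake` and the masses, decay rate and step sizes are hypothetical values chosen only for illustration, not taken from [17].

```python
import math

def decay_brake(mass_normal, mass_decayed, lam, h):
    """Return the two logarithmic 'brake' terms of the decay step.

    mass_normal  : |rho|, the mass that stays 'normal' during one step
    mass_decayed : |rho_ND|, the mass that decays during the step
    lam, h       : decay rate and time step, both positive
    """
    # -|rho| log e^{-lam h}, which simplifies to lam * h * |rho|
    stay_cost = -mass_normal * math.log(math.exp(-lam * h))
    # -|rho_ND| log(1 - e^{-lam h}); expm1 keeps precision for small h
    decay_cost = -mass_decayed * math.log(-math.expm1(-lam * h))
    return stay_cost, decay_cost

# Hypothetical example: 10% of the mass decays in a single step.
steps = [1.0, 0.1, 0.01, 0.001]
costs = [decay_brake(0.9, 0.1, lam=1.0, h=h) for h in steps]
for h, (stay, decay) in zip(steps, costs):
    print(f"h={h:g}  stay={stay:.4f}  decay={decay:.4f}")
```

For a fixed decayed mass the second term grows without bound as the step size shrinks, while the first term vanishes: letting a fixed fraction decay within an ever shorter step becomes increasingly costly.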

6.5 General remarks on interacting particle systems 

Section 5 explained how, once a large deviation principle for the interacting particle system
with rate functional $I(\rho)$ is established, different Wasserstein-type metrics occur in a natural
way. Such large deviation results are stronger than results on limit equations. Indeed, a part
of the standard proof of a large deviation result involves modifying the process by adding a
forcing such that a given path which does not solve the original limit equation solves the limit
equation of the modified process. So the question arises whether the point of view advocated
in this paper has the potential of deriving limit equations without relying on large deviation
results that themselves contain limit results derived in the classical way. This open question is of
particular importance because limit points of the implicit time discretisation provide a weak notion of
solution of the limit equation in cases where distributional solutions are not appropriate, e.g.,
for problems with a sharp interface like the mean curvature flow. In situations such as (24),
where a particle interacts with the average of many others, the distribution of a family of
initially independent particles stays close to a product measure (propagation of chaos), so a
modification of the techniques for independent particles seems promising.

7 Conclusion 

The examples of this paper illustrate how the two concepts of large-deviation principles for 
stochastic particle systems and gradient flows are closely entwined. Further examples are 
currently under study, such as Brownian particles with inertia, which lead to the Kramers' 
equation, and rate-independent systems such as friction and fracture. We expect that many 
more examples of this kind will be uncovered. 

A Free energy and the Boltzmann distribution 

In this appendix we show how the free energy 

$$\mathcal{F}(\rho) := \mathcal{H}(\rho|\mu) + \frac{1}{k\theta} E(\rho) \tag{38}$$

arises from the coupling of a system of particles with a heat bath. Here $\theta > 0$ (in Kelvin) is the
temperature of the heat bath, and the Boltzmann constant $k$ has the value $1.4 \cdot 10^{-23}$ J/K.
The measure $\rho \in \mathcal{P}(\mathcal{X})$ is the probability distribution of the particles in a state space $\mathcal{X}$,
and $E$ is the average energy of the particles:

$$E(\rho) = \int_{\mathcal{X}} e(x)\, \rho(dx),$$

where $e\colon \mathcal{X} \to \mathbb{R}$ is a fixed function that we call the energy of a state $x \in \mathcal{X}$. We now
construct an explicit system in which $\mathcal{F}$ arises as the large-deviation rate functional. This
will allow us to interpret all these concepts in the context of large deviations.

We start by choosing a system $S$ and its connection to a heat bath called $S_B$. Both
are probabilistic systems of particles; $S$ consists of $n$ independent particles $X_i \in \mathcal{X}$, with
probability law $\mu \in \mathcal{P}(\mathcal{X})$; similarly $S_B$ consists of $m$ independent particles $Y_j \in \mathcal{Y}$, with law
$\nu \in \mathcal{P}(\mathcal{Y})$. The total state space of the system is therefore $\mathcal{X}^n \times \mathcal{Y}^m$.

The coupling between these systems is done via an energy constraint. We assume that
there are energy functions $e\colon \mathcal{X} \to \mathbb{R}$ and $e_B\colon \mathcal{Y} \to \mathbb{R}$, and we will constrain the joint system
to be in a state of fixed total energy, i.e., we will only allow states in $\mathcal{X}^n \times \mathcal{Y}^m$ that satisfy

$$\sum_{i=1}^{n} e(x_i) + \sum_{j=1}^{m} e_B(y_j) = \text{constant}. \tag{39}$$

The physical interpretation of this is that energy (in the form of heat) may flow freely from 
one system to the other, but no other form of interaction is allowed. 

Similar to the example in the Introduction, we describe the total states of systems $S$ and
$S_B$ by empirical measures $\rho_n = \frac{1}{n}\sum_i \delta_{X_i}$ and $\zeta_m = \frac{1}{m}\sum_j \delta_{Y_j}$. We define the average energies
$E(\rho_n) := \frac{1}{n}\sum_i e(X_i) = \int_{\mathcal{X}} e \, d\rho_n$ and $E_B(\zeta_m) := \int_{\mathcal{Y}} e_B \, d\zeta_m$, so that the energy constraint (39)
reads $nE(\rho_n) + mE_B(\zeta_m) = \text{constant}$.

By Sanov's theorem each of the systems separately satisfies a large-deviation principle
with rate functions $I(\rho) = \mathcal{H}(\rho|\mu)$ and $I_B(\zeta) = \mathcal{H}(\zeta|\nu)$. However, instead of using the explicit
formula for $I_B$, we are going to assume that $I_B$ can be written as a function of the energy
$E_B$ of the heat bath alone, i.e., $I_B(\zeta) = I_B(E_B(\zeta))$. For the coupled system we derive a joint
large-deviation principle by choosing (a) $m = nN$ for some large $N > 0$, and (b) the
constant in (39) to scale as $nN$, i.e.,

$$nE(\rho_n) + nN E_B(\zeta_{nN}) = nN\overline{E} \quad \text{for some } \overline{E}.$$

Formally, the joint system then satisfies a large-deviation principle

$$\mathrm{Prob}\bigl((\rho_n, \zeta_{nN}) \approx (\rho, \zeta) \,\big|\, E(\rho_n) + N E_B(\zeta_{nN}) = N\overline{E}\bigr) \sim \exp(-nJ(\rho, \zeta)),$$

with rate functional

$$J(\rho, \zeta) := \begin{cases}
\mathcal{H}(\rho|\mu) + N I_B(E_B(\zeta)) + \text{constant} & \text{if } E(\rho) + N E_B(\zeta) = N\overline{E}, \\
+\infty & \text{otherwise.}
\end{cases}$$

Here the constant is chosen to ensure that $\inf J = 0$.

The functional $J$ can be reduced to a functional of $\rho$ alone,

$$J(\rho) = \mathcal{H}(\rho|\mu) + N I_B\Bigl(\overline{E} - \frac{E(\rho)}{N}\Bigr) + \text{constant}.$$

In the limit of large $N$, one might approximate

$$N I_B\Bigl(\overline{E} - \frac{E(\rho)}{N}\Bigr) \approx N I_B(\overline{E}) - I_B'(\overline{E})\, E(\rho).$$

The first term above is absorbed in the constant, and we find

$$J(\rho) \approx \mathcal{H}(\rho|\mu) - E(\rho)\, I_B'(\overline{E}) + \text{constant}.$$

We expect that $I_B'$ is negative, since larger energies typically lead to higher probabilities and
therefore smaller values of $I_B$. Now we simply define $k\theta := -1/I_B'(\overline{E})$, and we find

$$J(\rho) \approx \mathcal{H}(\rho|\mu) + \frac{1}{k\theta} E(\rho) + \text{constant}.$$

This is the same expression as (38). Note that the right-hand side can be written as $\mathcal{H}(\rho|\tilde\mu)$,
where $\tilde\mu$ is the tilted distribution

$$\tilde\mu(A) = \frac{\displaystyle\int_A e^{-e(x)/k\theta}\, \mu(dx)}{\displaystyle\int_{\mathcal{X}} e^{-e(x)/k\theta}\, \mu(dx)}.$$



This derivation shows that the effect of the heat bath is to tilt the system $S$: a state $\rho$
of $S$ with larger energy $E(\rho)$ implies a smaller energy $E_B$ of $S_B$, which in turn reduces the
probability of $\rho$. This is reflected in the approximation $-I_B'(\overline{E})\,E(\rho)$ of $N I_B(\zeta)$, up to a constant. The role of
temperature $\theta$ is that of an exchange rate, since it characterises the change in probability (as
measured by the rate function $I_B$) per unit of energy. When $\theta$ is large, the exchange rate
is low, and then larger energies incur only a small probabilistic penalty. When temperature
is low, then higher energies are very expensive, and therefore more rare. From this point
of view, the Boltzmann constant $k$ is simply the conversion factor that converts our Kelvin
temperature scale for $\theta$ into the appropriate 'exchange rate' scale.
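
The tilting identity and the 'exchange rate' reading of temperature can be checked on a finite state space. The sketch below is only an illustration under stated assumptions: the three-point state space, the reference law `mu`, the energies `energy`, the state `rho` and all values of `k_theta` are hypothetical, and the helper names are not from the text.

```python
import math

def relative_entropy(rho, mu):
    """H(rho|mu) = sum_x rho(x) log(rho(x)/mu(x)) on a finite state space."""
    return sum(r * math.log(r / m) for r, m in zip(rho, mu) if r > 0)

def tilt(mu, energy, k_theta):
    """Tilted law, proportional to exp(-e(x)/k_theta) mu(x); also returns the normaliser."""
    weights = [m * math.exp(-e / k_theta) for m, e in zip(mu, energy)]
    z = sum(weights)
    return [w / z for w in weights], z

def mean_energy(law, energy):
    """Average energy E(law) = sum_x e(x) law(x)."""
    return sum(p * e for p, e in zip(law, energy))

mu = [0.5, 0.3, 0.2]       # hypothetical reference law of one particle
energy = [0.0, 1.0, 2.0]   # hypothetical energies e(x)
rho = [0.2, 0.5, 0.3]      # hypothetical state of the system S
k_theta = 0.7              # hypothetical value of k * theta

# H(rho|mu_tilde) equals H(rho|mu) + E(rho)/(k theta) plus the constant log Z.
mu_tilde, z = tilt(mu, energy, k_theta)
lhs = relative_entropy(rho, mu_tilde)
rhs = relative_entropy(rho, mu) + mean_energy(rho, energy) / k_theta + math.log(z)
print("difference:", abs(lhs - rhs))

# Low temperature penalises energy heavily; high temperature barely tilts.
cold_mean = mean_energy(tilt(mu, energy, 0.1)[0], energy)
warm_mean = mean_energy(tilt(mu, energy, 10.0)[0], energy)
print("mean energy, cold vs warm:", cold_mean, warm_mean)
```

The additive constant in the identity is the logarithm of the normalising integral, which does not depend on the state and is therefore absorbed in the constant of the rate functional.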

In thermodynamics one often encounters the identity (or definition) $1/\theta = dS/dE$. This is
formally the same as our definition of $1/k\theta$ as $-dI_B/dE$, if one interprets $I_B$ as an entropy and
adopts the convention to multiply the non-dimensional quantity $I_B$ with $-k$.